

Robust Batch-Level Query Routing for Large Language Models under Cost and Capacity Constraints

Jelena Markovic-Voronov (LinkedIn), Kayhan Behdin (LinkedIn), Yuanda Xu (LinkedIn), Zhengze Zhou (LinkedIn), Zhipeng Wang (LinkedIn), Rahul Mazumder (LinkedIn, MIT)

System Optimization & Efficiency · Architectural Patterns & Composition

A batch-level LLM routing framework that jointly assigns models to an entire incoming request batch, rather than routing each query independently, while respecting cost, GPU, and concurrency constraints. Routing at the batch level keeps adversarial or skewed batches from defeating cost controls, and a robust variant explicitly accounts for uncertainty in predicted model quality.

Presentation

Talk

Paper Session 3: Systems Efficiency

Wednesday, May 27 · 3:30 PM – 3:40 PM

Bayshore Ballroom

Poster

Wednesday, May 27 · 5:15 PM – 6:45 PM

Carmel / Monterey

Abstract

We study the problem of routing queries to large language models (LLMs) under cost, GPU-resource, and concurrency constraints. Prior per-query routing methods often fail to control batch-level cost, especially under non-uniform or adversarial batching. To address this, we propose a batch-level, resource-aware routing framework that jointly optimizes model assignment for each batch while respecting cost and model capacity limits. We further introduce a robust variant that accounts for uncertainty in predicted LLM performance, along with an offline instance allocation procedure that balances quality and throughput across multiple models. Experiments on two multi-task LLM benchmarks show that robustness improves accuracy by 1–14% over non-robust counterparts (depending on the performance estimator), that batch-level routing outperforms per-query methods by up to 24% under adversarial batching, and that optimized instance allocation yields additional gains of up to 3% over a non-optimized allocation, all while strictly satisfying the cost and GPU resource constraints.
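
To make the idea of joint, batch-level assignment concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not give the formulation or the solver, so this example assumes a single per-batch cost budget, a per-model cap on the number of queries per batch, and an exhaustive search that is only practical for tiny batches. The robust option is also an assumption: it subtracts a hypothetical uncertainty margin from each predicted quality score, one simple way to be pessimistic about estimates. All names (`route_batch`, `quality`, `cost`, `capacity`, `margin`) and all numbers are illustrative.

```python
from itertools import product


def route_batch(quality, cost, budget, capacity, margin=None):
    """Pick one model per query for a whole batch (illustrative sketch).

    quality[i][m]: predicted quality of model m on query i (assumed given)
    cost[i][m]:    cost of answering query i with model m
    budget:        maximum total cost allowed for the batch
    capacity[m]:   maximum number of queries model m may serve in this batch
    margin[i][m]:  optional uncertainty subtracted from quality[i][m]
                   (an assumed, simple form of robustness)
    """
    n_queries = len(quality)
    n_models = len(capacity)
    best_score, best_assignment = float("-inf"), None

    # Enumerate every joint assignment; exponential, so tiny batches only.
    for assignment in product(range(n_models), repeat=n_queries):
        total_cost = sum(cost[i][m] for i, m in enumerate(assignment))
        if total_cost > budget:
            continue  # violates the batch-level cost budget
        if any(assignment.count(m) > capacity[m] for m in range(n_models)):
            continue  # violates a per-model capacity limit
        score = sum(
            quality[i][m] - (margin[i][m] if margin else 0.0)
            for i, m in enumerate(assignment)
        )
        if score > best_score:
            best_score, best_assignment = score, assignment

    return best_assignment, best_score


# Toy batch: 3 queries, model 0 is small and cheap, model 1 is large and costly.
quality = [[0.6, 0.9], [0.5, 0.95], [0.7, 0.75]]
cost = [[1, 4], [1, 4], [1, 4]]
capacity = [3, 1]  # the large model can take at most one query per batch

plain_assignment, plain_score = route_batch(quality, cost, budget=6, capacity=capacity)

# Same batch, but the large model's estimate on query 1 is marked as uncertain.
margin = [[0.0, 0.0], [0.0, 0.5], [0.0, 0.0]]
robust_assignment, robust_score = route_batch(
    quality, cost, budget=6, capacity=capacity, margin=margin
)

print(plain_assignment, robust_assignment)
```

In this toy batch the non-robust call sends the single large-model slot to query 1, where the predicted gain is largest. Once that estimate is penalized for uncertainty, the slot moves to query 0. A per-query router would make each choice in isolation and could not enforce the shared budget or the capacity cap in this way. A realistic system would replace the exhaustive search with an integer or linear programming solver.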

ACM CAIS 2026 Sponsors